Introduction

All code associated with this page can be found here.

Ulcerative colitis (UC) is one of the two main forms of inflammatory bowel disease (IBD), a chronic condition in which the colon and rectum become inflamed. UC is characterized by small ulcers in the lining of the colon and by a range of symptoms, including abdominal pain, diarrhea and weight loss.

As an immune-mediated disease, UC involves a complex dysregulation of innate and adaptive immune activity, whose main features include the overactivation of neutrophils and a Th2-like profile (e.g. see here).

Although a substantial body of data has been collected on the immunopathology of UC, key questions remain unanswered, such as how the gut microbiota interacts with immune cells, or whether dysfunction of the epithelial barrier is a cause or a consequence of altered immune responses.

Transcriptomic analyses of single cells or whole tissue samples have helped to address these questions (e.g. see here or here). Notably, most of these analyses use samples from healthy individuals as controls, comparing their transcriptomic signatures with those of inflamed tissue from UC patients. Less attention has been given to alternative study designs, such as using non-inflamed tissue from the same UC patients as the control for inflamed tissue. Because each patient then serves as their own control, such designs reduce the influence of differences between individuals and may offer advantages, for example when evaluating the relative abundance of specific immune cell types or expression changes in genes related to the integrity of the epithelial barrier.

In this work, I explore these potential advantages by analyzing bulk RNA-seq data from both inflamed and non-inflamed samples obtained from UC patients. The RNA-seq was performed by the group of William Gordon, and the counts are available in the Gene Expression Omnibus (GEO) series GSE107593.

Raw counts matrix

The count matrix contains the gene symbols and gene names (columns 1 and 2) followed by the raw counts for each of the 48 samples (columns 3 onward).
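As an illustration of the layout just described, here is a minimal Python sketch that parses such a table into separate gene annotations and a numeric matrix. It is not the code used for this analysis; the tab delimiter, the header row, and the function name `read_count_matrix` are assumptions.

```python
import csv

def read_count_matrix(lines, delimiter="\t"):
    """Parse a count table laid out as: symbol, name, then one column per sample.

    Hypothetical helper. `lines` is any iterable of text lines (e.g. an open
    file); a header row and a tab delimiter are assumed.
    Returns (sample_ids, symbols, names, counts) with counts[gene][sample].
    """
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader)
    sample_ids = header[2:]  # columns 3 onward hold one sample each
    symbols, names, counts = [], [], []
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [int(float(x)) for x in row[2:]]
        if len(values) != len(sample_ids):
            raise ValueError(
                f"{row[0]}: {len(values)} counts, expected {len(sample_ids)}")
        symbols.append(row[0])
        names.append(row[1])
        counts.append(values)
    return sample_ids, symbols, names, counts
```

With the real file, `sample_ids` should have 48 entries; the toy rows in any quick test are made-up values, not data from GSE107593.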

Study design (samples and individuals)

Four samples were obtained from each of the 12 individuals: two from an inflamed area and two from a non-inflamed area, giving 48 samples in total. Each sample was taken from one of six areas: four colonic segments (ascending, descending, sigmoid or transverse), the rectum, or the large intestine as a whole.

Patient IDs

157
877
1057
1077
1192
1214
8854
8855
8874
8878
8879
8881

Area from which samples were obtained

Ascending
Descending
Large
Rectum
Sigmoid
Transverse
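The design above (12 patients, each with two inflamed and two non-inflamed samples, every sample from one of the listed areas) can be checked against per-sample metadata before any analysis. The sketch below is a hypothetical helper, not part of the original code; the metadata format (one dict per sample with `patient`, `condition` and `area` keys) and the exact area spellings are assumptions and would need to match the real annotation file.

```python
from collections import Counter
from itertools import product

# Patient IDs and sampling areas as listed above. The area spellings are
# assumed; adjust them to whatever the metadata file actually uses.
PATIENT_IDS = ["157", "877", "1057", "1077", "1192", "1214",
               "8854", "8855", "8874", "8878", "8879", "8881"]
AREAS = {"Ascending", "Descending", "Large", "Rectum", "Sigmoid", "Transverse"}
CONDITIONS = ("inflamed", "non-inflamed")

def expected_layout(patient_ids, reps_per_condition=2):
    """All (patient, condition, replicate) slots implied by the design."""
    return list(product(patient_ids, CONDITIONS,
                        range(1, reps_per_condition + 1)))

def validate_metadata(samples, patient_ids, reps_per_condition=2):
    """Return a list of problems found in hypothetical per-sample metadata.

    `samples` is a list of dicts with 'patient', 'condition' and 'area' keys,
    one per column of the count matrix.
    """
    problems = []
    per_group = Counter((s["patient"], s["condition"]) for s in samples)
    for pid, cond in product(patient_ids, CONDITIONS):
        if per_group[(pid, cond)] != reps_per_condition:
            problems.append(f"patient {pid}: {per_group[(pid, cond)]} {cond} samples")
    for s in samples:
        if s["area"] not in AREAS:
            problems.append(f"unknown area {s['area']!r}")
    return problems
```

For this study, `expected_layout(PATIENT_IDS)` yields 12 × 2 × 2 = 48 slots, matching the 48 sample columns of the count matrix.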

Log-normalization of counts
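The exact normalization used in this analysis is not stated here. One common choice, shown below as a minimal sketch, is log2 counts per million with a pseudocount: each count is divided by its sample's total (library size) to remove differences in sequencing depth, scaled to one million, and log-transformed so that a few highly expressed genes do not dominate downstream steps. Alternatives such as DESeq2's variance-stabilizing transformation would give different values. The function name and the pseudocount of 1 are assumptions.

```python
import math

def log_cpm(counts, pseudocount=1.0):
    """Convert raw counts (counts[gene][sample]) to log2(CPM + pseudocount).

    CPM = count / library size * 1e6, where the library size is the column
    total for that sample. The pseudocount keeps zero counts finite.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[s] for row in counts) for s in range(n_samples)]
    if any(size == 0 for size in lib_sizes):
        raise ValueError("a sample has zero total counts")
    return [
        [math.log2(row[s] / lib_sizes[s] * 1e6 + pseudocount)
         for s in range(n_samples)]
        for row in counts
    ]
```

Because each column is divided by its own total, a sample sequenced twice as deeply ends up on the same scale: `log_cpm([[1, 2], [3, 6]])` gives identical values in both columns.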

Most variable genes

Mean vs coefficient of variation (CoV) for all genes

Mean vs CoV for most variable genes
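The coefficient of variation is the standard deviation divided by the mean. For count data it tends to fall as mean expression rises (for pure Poisson noise, CoV = 1/√mean), so a mean-vs-CoV plot helps spot genes that vary more than expected for their expression level. The sketch below computes both quantities per gene and selects the top N genes. Whether the original selection ranked genes by variance or by CoV is not stated, so both are offered; the function names are hypothetical.

```python
import math
import statistics

def mean_and_cov(values):
    """Mean and coefficient of variation (sample SD / mean) of one gene."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean, (sd / mean if mean > 0 else math.nan)

def top_variable_genes(expr, symbols, n_top=500, by="variance"):
    """Symbols of the n_top genes ranked by variance or by CoV, highest first.

    expr[gene][sample] holds expression values (raw or normalized).
    Genes with a non-positive mean get an undefined CoV and rank last.
    """
    scored = []
    for symbol, row in zip(symbols, expr):
        if by == "variance":
            score = statistics.variance(row)
        elif by == "cov":
            cov = mean_and_cov(row)[1]
            score = -math.inf if math.isnan(cov) else cov
        else:
            raise ValueError(f"unknown ranking criterion: {by!r}")
        scored.append((score, symbol))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [symbol for _, symbol in scored[:n_top]]
```

The two criteria can disagree: variance favours highly expressed genes, while CoV favours genes that vary strongly relative to their mean.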

Principal component analysis (PCA)

PCA with all genes

Scree plot

PCA Scores

PCA with 500 most variable genes

PCA Scores
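To make the PCA steps concrete, here is a dependency-free sketch (not the original code, which would more likely use R's `prcomp`). Genes are centered across samples, the sample-by-sample Gram matrix is eigen-decomposed with cyclic Jacobi rotations, and the eigenvalues give the explained-variance ratios shown in a scree plot, while eigenvectors scaled by the square root of their eigenvalues give the PCA scores. Restricting the analysis to the 500 most variable genes just means passing those rows. Whether the original PCA also scaled genes to unit variance is not stated; this sketch only centers them, and the function names are hypothetical.

```python
import math

def jacobi_eigh(a, max_sweeps=100):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi
    rotations. Returns (eigenvalues, eigenvectors as columns)."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    norm = sum(x * x for row in a for x in row)  # invariant under rotations
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-22 * norm:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q of A·P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q of Pᵀ·(A·P)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors V·P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def pca_samples(expr, n_components=2):
    """PCA of samples (columns) with genes (rows) as variables.

    Returns (scores, explained): scores[s][k] is sample s on PC k+1 and
    explained[k] is the share of total variance captured by PC k+1.
    """
    n = len(expr[0])
    centered = []
    for row in expr:
        mean = sum(row) / n
        centered.append([x - mean for x in row])
    # Gram matrix between samples; its eigenvalues are (n - 1) times the PC
    # variances, so the variance ratios are unaffected.
    gram = [[sum(r[i] * r[j] for r in centered) for j in range(n)]
            for i in range(n)]
    eigvals, eigvecs = jacobi_eigh(gram)
    order = sorted(range(n), key=lambda k: eigvals[k], reverse=True)
    eigvals = [max(eigvals[k], 0.0) for k in order]  # clip round-off negatives
    total = sum(eigvals) or 1.0
    explained = [ev / total for ev in eigvals]
    scores = [[eigvecs[s][order[k]] * math.sqrt(eigvals[k])
               for k in range(n_components)] for s in range(n)]
    return scores, explained
```

The sign of each component is arbitrary, so score plots from different tools may appear mirrored while conveying the same structure.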

Differential expression analysis
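The differential expression method used here is not stated. To illustrate how the within-patient design can be exploited, the sketch below swaps in a simple paired test on log-scale expression. Replicates are averaged per patient and condition, a per-patient inflamed-minus-non-inflamed difference is formed, significance comes from an exact sign-flip permutation test, and p-values are adjusted with the Benjamini-Hochberg procedure. All function names and the condition labels are assumptions, and this is not a substitute for count-based models such as DESeq2 or limma-voom.

```python
import itertools
import statistics
from collections import defaultdict

def per_patient_differences(values, patients, conditions,
                            case="inflamed", control="non-inflamed"):
    """Average replicates per patient and condition, then return one
    case-minus-control difference per patient that has both conditions."""
    groups = defaultdict(list)
    for value, patient, condition in zip(values, patients, conditions):
        groups[(patient, condition)].append(value)
    return [
        statistics.fmean(groups[(p, case)]) - statistics.fmean(groups[(p, control)])
        for p in sorted(set(patients))
        if (p, case) in groups and (p, control) in groups
    ]

def sign_flip_pvalue(diffs):
    """Exact two-sided paired permutation test: under H0 each patient's
    difference is equally likely to be positive or negative."""
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    hits = 0
    for signs in itertools.product((1, -1), repeat=n):
        if abs(sum(s * d for s, d in zip(signs, diffs))) / n >= observed - 1e-12:
            hits += 1
    return hits / 2 ** n

def benjamini_hochberg(pvalues):
    """BH-adjusted p-values: p_adj(k) = min over j >= k of m * p_(j) / j."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def paired_de(log_expr, symbols, patients, conditions):
    """One row per gene: mean log2 fold change (case - control), p and padj."""
    rows = []
    for symbol, values in zip(symbols, log_expr):
        diffs = per_patient_differences(values, patients, conditions)
        rows.append({"gene": symbol,
                     "log2fc": statistics.fmean(diffs),
                     "p": sign_flip_pvalue(diffs)})
    for row, padj in zip(rows, benjamini_hochberg([r["p"] for r in rows])):
        row["padj"] = padj
    return rows
```

With 12 patients there are 2^12 = 4096 sign patterns, so the smallest attainable two-sided p-value is 2/4096 ≈ 4.9e-4. This coarseness is one reason model-based tests are usually preferred when thousands of genes are adjusted for multiple testing.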

GO enrichment of up- and down-regulated genes
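The enrichment tool used here is not named. A common approach, sketched below as an assumption, is over-representation analysis: for each GO term, a one-sided hypergeometric test asks whether the differentially expressed genes contain more members of that term than expected by chance, given the universe of all tested genes. The mapping `term_to_genes` is a hypothetical input (for example, built from a GO annotation file), and the test would be run separately for up- and down-regulated genes, with the resulting p-values adjusted across terms.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the term, n drawn)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

def go_enrichment(deg_set, universe, term_to_genes):
    """Rows (term, k, K, p) for terms hit by at least one DEG, sorted by p.

    N = genes in the universe, n = DEGs in the universe, K = term genes in
    the universe, k = term genes that are also DEGs.
    """
    N = len(universe)
    degs = deg_set & universe
    n = len(degs)
    results = []
    for term, genes in term_to_genes.items():
        annotated = genes & universe
        k = len(annotated & degs)
        if k == 0:
            continue
        results.append((term, k, len(annotated), hypergeom_sf(k, N, len(annotated), n)))
    results.sort(key=lambda row: row[3])
    return results
```

Restricting both the term annotations and the DEGs to the universe matters: counting genes that were never tested would bias the expected overlap.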

Heatmap of the top 50 differentially expressed genes (DEGs)

Ranked by adjusted p-value (p.adj)

Z-score normalized
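As an illustration of the two heatmap steps named above, the sketch below selects the genes with the smallest adjusted p-values and standardizes each gene across samples before plotting. That the z-score is computed per gene (row) rather than per sample is an assumption, as are the function names and the result-row format (dicts with a `padj` key).

```python
import statistics

def top_by_padj(results, n_top=50):
    """The n_top result rows with the smallest adjusted p-values."""
    return sorted(results, key=lambda row: row["padj"])[:n_top]

def zscore_rows(matrix):
    """Standardize each gene (row) to mean 0 and sample SD 1 across samples,
    so heatmap colours show relative rather than absolute expression.
    Constant rows are mapped to zeros."""
    scaled = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row)
        scaled.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return scaled
```

Without the z-score step, the colour scale would be dominated by the most highly expressed genes and within-gene differences between inflamed and non-inflamed samples would be hard to see.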

Plot single genes

Imputing immune cell fractions